Unexpected cross-species contamination in genome sequencing projects

Authors

  • Samier Merchant
  • Derrick E. Wood
  • Steven L. Salzberg

Abstract

The raw data from a genome sequencing project sometimes contain DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of the domestic cow, Bos taurus, and identified 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively complete, contained multiple sequences that actually derived from the cow and sheep genomes. Our results illustrate the need to carefully validate reports of anomalous DNA that rely on comparisons to either draft or finished genomes.
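The screening step described above rests on comparing each contig's sequence against labeled reference genomes. The sketch below illustrates one simple version of that idea, exact matching of canonical k-mers followed by a majority vote, assuming a small in-memory reference set. It is not the software the authors used; the names `build_index`, `classify_contig`, `screen_assembly`, `host_label`, and `min_fraction` are hypothetical. Real classifiers use far larger databases and resolve k-mers shared between taxa (for example, by assigning them to a lowest common ancestor) instead of ignoring them as done here.

```python
from collections import Counter

# Complement table for canonical k-mers (other characters, such as N, pass through).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each overlapping k-mer of seq in canonical form, i.e. the
    lexicographically smaller of the k-mer and its reverse complement,
    so a contig matches a reference regardless of strand."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, rc)


def build_index(references, k):
    """Map each canonical k-mer to the set of reference labels containing it.
    `references` is a hypothetical toy dict {label: sequence}."""
    index = {}
    for label, seq in references.items():
        for kmer in canonical_kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify_contig(contig, index, k, min_fraction=0.5):
    """Return the label with the most k-mer hits, counting only k-mers unique
    to one label, or None if that label covers < min_fraction of the contig."""
    votes = Counter()
    total = 0
    for kmer in canonical_kmers(contig, k):
        total += 1
        labels = index.get(kmer)
        if labels is not None and len(labels) == 1:
            votes[next(iter(labels))] += 1
    if not votes:
        return None  # no informative k-mers (or contig shorter than k)
    label, hits = votes.most_common(1)[0]
    return label if hits / total >= min_fraction else None


def screen_assembly(contigs, index, k, host_label, min_fraction=0.5):
    """Flag contigs whose best-supported label is not the expected host."""
    flagged = {}
    for name, seq in contigs.items():
        label = classify_contig(seq, index, k, min_fraction)
        if label is not None and label != host_label:
            flagged[name] = label
    return flagged


if __name__ == "__main__":
    # Toy data: homopolymers keep the expected result easy to check by hand.
    refs = {"host": "A" * 12, "microbe": "G" * 12}
    idx = build_index(refs, k=5)
    draft = {"contig_1": "A" * 8, "contig_2": "C" * 8, "contig_3": "ACGTACGT"}
    print(screen_assembly(draft, idx, k=5, host_label="host"))
    # prints {'contig_2': 'microbe'}
```

Here `contig_2` is flagged even though it is written on the opposite strand from the "microbe" reference, because both reduce to the same canonical k-mers. A flag only means the contig matches a reference carrying that label; as the N. gonorrhoeae case shows, the reference itself may contain host DNA, so flagged contigs still need independent validation.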


Similar resources

Paging through history: parchment as a reservoir of ancient DNA for next generation sequencing

Parchment represents an invaluable cultural reservoir. Retrieving an additional layer of information from these abundant, dated livestock-skins via the use of ancient DNA (aDNA) sequencing has been mooted by a number of researchers. However, prior PCR-based work has indicated that this may be challenged by cross-individual and cross-species contamination, perhaps from the bulk parchment prepara...

Here, there, and everywhere

The main hurdle for genome sequencing projects these days is no longer the effort and cost of generating sequence data, which has become exponentially cheaper, but the capacity to analyse huge amounts of data and make sense of it. This endeavour is made harder by another problem that has begun to emerge over the past years: DNA contamination. Contamination impacts both sequence data generation, ...

Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data

Trace quantities of contaminating DNA are widespread in the laboratory environment, but their presence has received little attention in the context of high throughput sequencing. This issue is highlighted by recent works that have rested controversial claims upon sequencing data that appear to support the presence of unexpected exogenous species. I used reads that preferentially aligned to alte...

Efficient cross-species capture hybridization and next-generation sequencing of mitochondrial genomes from noninvasively sampled museum specimens.

The ability to uncover the phylogenetic history of recently extinct species and other species known only from archived museum material has rapidly improved due to the reduced cost and increased sequence capacity of next-generation sequencing technologies. One limitation of these approaches is the difficulty of isolating and sequencing large, orthologous DNA regions across multiple divergent spe...

A simple protocol to obtain highly pure Wolbachia endosymbiont DNA for genome sequencing.

Most genome sequencing projects using intracellular bacteria face difficulties in obtaining sufficient bacterial DNA free of host contamination. We have developed a simple and rapid protocol to isolate endosymbiont DNA virtually free from fly and mosquito host DNA. We purified DNA from six Wolbachia strains in preparation for genome sequencing using this method, and achieved up to 97% pure Wolb...

Volume: 2

Publication date: 2014